Chapter 9 - ARIMA Models
Forecasting: Principles and Practice

R. J. Serrano

05/12/2023

ARIMA models —

Learning objectives:

Introduction —

ARIMA models provide another approach to time series forecasting. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

9.1 - Stationarity and differencing —

A stationary time series is one whose statistical properties do not depend on the time at which the series is observed.

Thus, a stationary time series exhibits the following characteristics:

Source: What is stationarity? https://youtu.be/aIdTGKjQWjA

Figure 9.1: Which of this series are stationary?

Differencing

In Figure 9.1, note that the Google stock price was non-stationary in panel (a), but the daily changes were stationary in panel (b). This shows one way to make a non-stationary time series stationary — compute the differences between consecutive observations. This is known as differencing.

google_2018 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018)

# original time series
google_2018 |> 
     autoplot(Adj_Close) + 
     labs(title = 'Google Closing Stock Price ($USD)')

google_2018 |> ACF(Close) |>
  autoplot() + labs(subtitle = "Google closing stock price")

google_2018 |> ACF(difference(Close)) |>
  autoplot() + labs(subtitle = "Changes in Google closing stock price")

google_2018 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, ljung_box, lag = 8)

The ACF of the differenced Google stock price looks just like that of a white noise series. Since the p-value is greater than 0.05, this suggests that the daily change in the Google stock price is essentially a random amount which is uncorrelated with that of previous days.

Random walk model —-

A time series is said to follow a random walk process if the predicted value of the series in one period is equivalent to the value of the series in the previous period plus a random error. Source: https://analystprep.com/study-notes/cfa-level-2/random-walk-process/

If differenced series is white noise with zero mean:

where \(\varepsilon_t \sim NID(0,\sigma^2)\).

Random walk models are widely used for non-stationary data, particularly financial and economic data. Random walks typically have:

Seasonal differencing —-

A seasonal difference is the difference between an observation and the corresponding observation from the previous year.\[ y'_t = y_t - y_{t-m} \] Antidiabetic drug sales

a10 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(Cost = sum(Cost) / 1e6)

a10 |> autoplot(
  Cost
) + 
     labs(title = 'Austrialia monthly scripts for A10 (antidiabetic) drugs sold')

a10 |> autoplot(
  log(Cost)
) + 
     labs(title = 'Austrialia monthly scripts for A10 (antidiabetic) drugs sold')

a10 |> autoplot(
  log(Cost) |> difference(12)
) + 
     labs(title = 'Annual change in monthly scripts for A10 (antidiabetic) drugs sold')

Unit root test —-

One way to determine more objectively whether differencing is required is to use a unit root test. These are statistical hypothesis tests of stationarity that are designed for determining whether differencing is required.

Source: textbook Chapter 9.1

For example, let us apply it to the Google stock price data.

google_2018 |>
  features(Close, unitroot_kpss)

The p-value is less than 0.05, so the null hypothesis (time series is stationary) is rejected in favor of the alternate hypothesis (time series is NOT stationary).

We can difference the time series and apply the test again.

google_2018 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, unitroot_kpss)

This time, the p-value is greater than 0.05, so we can conclude that the time series appears stationary.

Exercise:

For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.

9.2 - Backshift notation

A very useful notational device is the backward shift operator, \(B\), which is used as follows: \[ B y_{t} = y_{t - 1} \]

In other words, \(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.

Two applications of \(B\) to \(y_{t}\) shifts the data back two periods: \[ B(By_{t}) = B^{2}y_{t} = y_{t-2} \]

9.3 - Autoregressive models

In a multiple regression model, introduced in Chapter 7, we forecast the variable of interest using a linear combination of predictors. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself.

Thus, an autoregressive model of order p can be written as

\[\begin{equation} y_t=c+\phi_1 y_{t-1}+\phi_2 y_{t-2}+\cdots+\phi_p y_{t-p}+\varepsilon_t \end{equation}\]

where \(ε_t\) is white noise. This is like a multiple regression but with lagged values of \(y_t\) as predictors. We refer to this as an AR(\(p\)) model, an autoregressive model of order \(p\).

For an AR(1) model:

9.4 - Moving average models

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model,

\[\begin{equation} y_t=c+\varepsilon_t+\theta_1 \varepsilon_{t-1}+\theta_2 \varepsilon_{t-2}+\cdots+\theta_q \varepsilon_{t-q} \end{equation}\]

where \(ε_t\) is white noise. We refer to this as an MA(\(q\)) model, a moving average model of order \(q\). Of course, we do not observe the values of \(ε_t\), so it is not really a regression in the usual sense.

Invertibility

9.5 - Non-seasonal ARIMA models

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, “integration” is the reverse of differencing).

The full model can be written as :

\[\begin{equation} y_t^{\prime}=c+\phi_1 y_{t-1}^{\prime}+\cdots+\phi_p y_{t-p}^{\prime}+\theta_1 \varepsilon_{t-1}+\cdots+\theta_q \varepsilon_{t-q}+\varepsilon_t \end{equation}\]

where \(y′_t\) is the differenced series (it may have been differenced more than once). The “predictors” on the right hand side include both lagged values of \(y_t\) and lagged errors.

Example: Egyptian exports

global_economy |>
     filter(Code == "EGY") |>
     autoplot(Exports) +
     labs(y = "% of GDP", title = "Egyptian exports")

The following R code selects a non-seasonal ARIMA model automatically.

fit <- global_economy |>
     filter(Code == "EGY") |>
     model(ARIMA(Exports))

report(fit)
## Series: Exports 
## Model: ARIMA(2,0,1) w/ mean 
## 
## Coefficients:
##          ar1      ar2      ma1  constant
##       1.6764  -0.8034  -0.6896    2.5623
## s.e.  0.1111   0.0928   0.1492    0.1161
## 
## sigma^2 estimated as 8.046:  log likelihood=-141.57
## AIC=293.13   AICc=294.29   BIC=303.43

Fitted model residual plots

fit |> 
     gg_tsresiduals()

Forecast for 10 years (periods)

fit |> forecast(h=10) |>
     autoplot(global_economy) +
     labs(y = "% of GDP", title = "Egyptian exports")

ACF and PACF plots

It is usually not possible to tell, simply from a time plot, what values of \(p\) and \(q\) are appropriate for the data. However, it is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for \(p\) and \(q\).

Interpreting ACF and PACF Plots for Time Series Forecasting

global_economy |>
     filter(Code == "EGY") |>
     gg_tsdisplay(Exports, plot_type = "partial")

We can detect a decaying sinusoidal pattern in the ACF, and the PACF shows the last significant spike at lag 4. This is what you would expect from an ARIMA(4,0,0) model.

fit2 <- global_economy |>
     filter(Code == "EGY") |>
     model(ARIMA(Exports ~ pdq(4,0,0)))

report(fit2)
## Series: Exports 
## Model: ARIMA(4,0,0) w/ mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4  constant
##       0.9861  -0.1715  0.1807  -0.3283    6.6922
## s.e.  0.1247   0.1865  0.1865   0.1273    0.3562
## 
## sigma^2 estimated as 7.885:  log likelihood=-140.53
## AIC=293.05   AICc=294.7   BIC=305.41

This model is only slightly worse than the ARIMA(2,0,1) model identified by ARIMA() (with an AICc value of 294.70 compared to 294.29).

9.7 - ARIMA modelling in fable

The ARIMA() function in the fable package uses a variation of the Hyndman-Khandakar algorithm (Hyndman & Khandakar, 2008), which combines unit root tests, minimisation of the AICc and MLE to obtain an ARIMA model. The arguments to ARIMA() provide for many variations on the algorithm. What is described here is the default behaviour.

Modelling procedure

When fitting an ARIMA model to a set of (non-seasonal) time series data, the following procedure provides a useful general approach.

Source: Figure 9.12

Portmanteau tests of residuals for ARIMA models

With ARIMA models, more accurate portmanteau tests are obtained if the degrees of freedom of the test statistic are adjusted to take account of the number of parameters in the model. Specifically, we use ℓ−\(K\) degrees of freedom in the test, where \(K\) is the number of AR and MA parameters in the model. So for the non-seasonal models, we have considered so far, \(K\)=\(p\)+\(q\). The value of \(K\) is passed to the ljung_box function via the argument dof, as shown in the example below.

Example: Central Africa Republic exports

global_economy |>
     filter(Code == "CAF") |>
     autoplot(Exports) +
     labs(title="Central African Republic exports",
          y="% of GDP")

  1. The time plot shows some non-stationarity, with an overall decline. The improvement in 1994 was due to a new government which overthrew the military junta and had some initial success, before unrest caused further economic decline.

  2. There is no evidence of changing variance, so we will not do a Box-Cox transformation.

  3. To address the non-stationarity, we will take a first difference of the data.

global_economy |>
     filter(Code == "CAF") |>
     gg_tsdisplay(difference(Exports), plot_type='partial')

  1. The PACF shown in the above figure is suggestive of an AR(2) model; so an initial candidate model is an ARIMA(2,1,0). The ACF suggests an MA(3) model; so an alternative candidate is an ARIMA(0,1,3).

  2. We fit both an ARIMA(2,1,0) and an ARIMA(0,1,3) model along with two automated model selections, one using the default stepwise procedure, and one working harder to search a larger model space.

caf_fit <- global_economy |>
     filter(Code == "CAF") |>
     model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
           arima013 = ARIMA(Exports ~ pdq(0,1,3)),
           stepwise = ARIMA(Exports),
           search = ARIMA(Exports, stepwise=FALSE))

caf_fit |> pivot_longer(!Country, names_to = "Model name",
                         values_to = "Orders")
caf_fit |> 
     glance() |> 
     arrange(AICc) |> 
     select(.model:BIC)

The four models have almost identical AICc values. Of the models fitted, the full search has found that an ARIMA(3,1,0) gives the lowest AICc value, closely followed by the ARIMA(2,1,0) and ARIMA(0,1,3) — the latter two being the models that we guessed from the ACF and PACF plots. The automated stepwise selection has identified an ARIMA(2,1,2) model, which has the highest AICc value of the four models.

ARIMA parameters for search model.

caf_fit |> 
     tidy() |> 
     filter(.model == 'search')
  1. The ACF plot of the residuals from the ARIMA(3,1,0) model shows that all autocorrelations are within the threshold limits, indicating that the residuals are behaving like white noise.
caf_fit |>
     select(search) |>
     gg_tsresiduals()

A portmanteau test (setting K=3) returns a large p-value, also suggesting that the residuals are white noise.

caf_fit |> 
     augment() |> 
     filter(.model=='search') |>
     features(.innov, ljung_box, lag = 10, dof = 3)

Forecast for 5 years (periods)

caf_fit |>
     forecast(h=5) |>
     filter(.model=='search') |>
     autoplot(global_economy)

Note that the mean forecasts look very similar to what we would get with a random walk (equivalent to an ARIMA(0,1,0)). The extra work to include AR and MA terms has made little difference to the point forecasts in this example, although the prediction intervals are much narrower than for a random walk model.

9.8 - Forecasting

Although we have calculated forecasts from the ARIMA models in our examples, we have not yet explained how they are obtained. Point forecasts can be calculated using the following three steps.

  1. Expand the ARIMA equation so that \(y_{t}\) is on the left hand side and all other terms are on the right.

  2. Rewrite the equation by replacing \(t\) with \(T\) + \(h\).

  3. On the right hand side of the equation, replace future observations with their forecasts, future errors with zero, and past errors with the corresponding residuals.

Source: (9.5)

Source: (9.5)

9.9 - Seasonal ARIMA models

So far, we have restricted our attention to non-seasonal data and non-seasonal ARIMA models. However, ARIMA models are also capable of modelling a wide range of seasonal data.

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. It is written as follows:

Source: Chapter 9.9

Example: Monthly US leisure and hospitality employment

We will describe seasonal ARIMA modelling using monthly US employment data for leisure and hospitality jobs from January 2001 to September 2019.

leisure <- us_employment |>
     filter(Title == "Leisure and Hospitality",
            year(Month) > 2000) |>
     mutate(Employed = Employed/1000) |>
     select(Month, Employed)

leisure |> 
     autoplot(Employed) + 
     geom_smooth(method = 'loess', se = FALSE, color = 'steelblue') + 
     labs(title = "US employment: leisure and hospitality",
          y="Number of people (millions)")

The data are clearly non-stationary, with strong seasonality and a nonlinear trend, so we will first take a seasonal difference. The seasonally differenced data is shown below:

leisure |>
     gg_tsdisplay(difference(Employed, 12),
                  plot_type='partial', lag=36) +
     labs(title="Seasonally differenced", y="")

These are also clearly non-stationary, so we take a further first difference.

leisure |>
  gg_tsdisplay(Employed |> 
                    difference(12) |> 
                    difference(), 
               plot_type='partial', lag=36) +
  labs(title = "Double differenced", y="")

Let’s try three ARIMA models from ACF/PACF/Auto ARIMA.

fit <- leisure |>
     model(
          arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
          arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
          auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
     )

fit |> 
     pivot_longer(everything(), 
                  names_to = "Model name", 
                  values_to = "Orders")
fit |> 
     glance() |> 
     arrange(AICc) |> 
     select(.model:BIC)

The three fitted models have similar AICc values, with the automatically selected model being a little better.

ARIMA parameters for auto model.

fit |> 
     tidy() |> 
     filter(.model == 'auto')

Residuals plots

fit |> 
     select(auto) |> 
     gg_tsresiduals(lag=36)

One small but significant spike (at lag 11) out of 36 is still consistent with white noise. To be sure, we use a Ljung-Box test, being careful to set the degrees of freedom to match the number of parameters in the model.

fit |> 
     augment() |> 
     filter(.model == "auto") |> 
     features(.innov, ljung_box, lag=24, dof=4)

The large p-value (greater than 0.05) confims that the residuals are similar to white noise.

Forecast for next 36 months (periods)

forecast(fit, h=36) |>
     filter(.model=='auto') |>
     autoplot(leisure) +
     labs(title = "US employment: leisure and hospitality",
          y="Number of people (millions)")

9.10 - ARIMA vs ETS

Source: Chapter 9.10

Source: Figure 9.27

Source: Table 9.4

Comparing ARIMA() and ETS() on non-seasonal data

We can use time series cross-validation to compare ARIMA and ETS models. Let’s consider the Australian population from the global_economy dataset.

aus_economy <- global_economy |>
     filter(Code == "AUS") |>
     mutate(Population = Population/1e6)

aus_economy |>
     slice(-n()) |>
     stretch_tsibble(.init = 10) |>
     model(
          ETS(Population),
          ARIMA(Population)
     ) |>
     forecast(h = 1) |>
     accuracy(aus_economy) |>
     select(.model, RMSE:MAPE)

In this case the ETS model has higher accuracy on the cross-validated performance measures. Below we generate and plot forecasts for the next 5 years generated from an ETS model.

aus_economy |>
     model(ETS(Population)) |>
     forecast(h = "5 years") |>
     autoplot(aus_economy |> filter(Year >= 2000)) +
     labs(title = "Australian population",
          y = "People (millions)")

Comparing ARIMA() and ETS() on seasonal data

In this case we want to compare seasonal ARIMA and ETS models applied to the quarterly cement production data (from aus_production).

cement <- aus_production |>
     select(Cement) |>
     filter_index("1988 Q1" ~ .)

train <- cement |> filter_index(. ~ "2007 Q4")

The output below shows the model selected and estimated by ARIMA(). The ARIMA model does well in capturing all the dynamics in the data as the residuals seem to be white noise.

fit_arima <- train |> model(ARIMA(Cement))
report(fit_arima)
## Series: Cement 
## Model: ARIMA(1,0,1)(2,1,1)[4] w/ drift 
## 
## Coefficients:
##          ar1      ma1   sar1     sar2     sma1  constant
##       0.8886  -0.2366  0.081  -0.2345  -0.8979    5.3884
## s.e.  0.0842   0.1334  0.157   0.1392   0.1780    1.4844
## 
## sigma^2 estimated as 11456:  log likelihood=-463.52
## AIC=941.03   AICc=942.68   BIC=957.35

Residuals plots

fit_arima |> 
     gg_tsresiduals(lag_max = 16)

fit_arima |> 
     augment() |> 
     features(.innov, ljung_box, lag = 16, dof = 5)

The output below also shows the ETS model selected and estimated by ETS(). This model also does well in capturing all the dynamics in the data, as the residuals similarly appear to be white noise.

fit_ets <- train |> model(ETS(Cement))
report(fit_ets)
## Series: Cement 
## Model: ETS(M,N,M) 
##   Smoothing parameters:
##     alpha = 0.7533714 
##     gamma = 0.0001000093 
## 
##   Initial states:
##      l[0]     s[0]    s[-1]    s[-2]     s[-3]
##  1694.712 1.031179 1.045209 1.011424 0.9121874
## 
##   sigma^2:  0.0034
## 
##      AIC     AICc      BIC 
## 1104.095 1105.650 1120.769
fit_ets |>
  gg_tsresiduals(lag_max = 16)

fit_ets |> 
     augment() |> 
     features(.innov, ljung_box, lag = 16)

The output below evaluates the forecasting performance of the two competing models over the test set. In this case the ARIMA model seems to be the slightly more accurate model based on the test set RMSE, MAPE and MASE.

# Generate forecasts and compare accuracy over the test set
bind_rows(
     fit_arima |> accuracy(),
     fit_ets |> accuracy(),
     fit_arima |> forecast(h = 10) |> accuracy(cement),
     fit_ets |> forecast(h = 10) |> accuracy(cement)
  ) |>
     select(-ME, -MPE, -ACF1)

Below we generate and plot forecasts from the ARIMA model for the next 3 years.

cement |>
     model(ARIMA(Cement)) |>
     forecast(h="3 years") |>
     autoplot(cement) +
     labs(title = "Cement production in Australia",
          y = "Tonnes ('000)")